
    Scalable Query Processing on Spatial Networks

    Spatial networks (e.g., road networks) are general graphs with spatial information (e.g., latitude/longitude) associated with the vertices and/or the edges of the graph. Techniques are presented for query processing on spatial networks that are based on the observed coherence between the spatial positions of the vertices and the shortest paths between them. This facilitates aggregation of the vertices into coherent regions that share vertices on the shortest paths between them. Using this observation, a framework, termed SILC, is introduced that precomputes and compactly encodes the N^2 shortest paths and network distances between every pair of vertices on a spatial network containing N vertices. The compactness of the shortest paths from a source vertex V is achieved by partitioning the destination vertices into subsets based on the identity of the first edge to them from V. The spatial coherence of these subsets is captured by using a quadtree representation whose dimension-reducing property enables the storage requirement of each subset to be made proportional to the perimeter of the spatially coherent regions, instead of to the number of vertices in the spatial network. In particular, experiments on a number of large road networks as well as a theoretical analysis have shown that the total storage for the shortest paths is reduced from O(N^3) to O(N^1.5). In addition to SILC, another framework, termed PCP, is proposed that also takes advantage of the spatial coherence of the source vertices and makes use of the well-separated pair decomposition to further reduce the storage, under suitably defined conditions, to O(N). Using these frameworks, scalable algorithms are presented to implement a wide variety of operations such as nearest neighbor finding and distance joins on large datasets of locations residing on a spatial network.
These frameworks essentially decouple the process of computing shortest paths from that of spatial query processing, and also decouple the domain of the participating objects from the domain of the vertices of the spatial network. This means that as long as the spatial network is unchanged, the algorithm and underlying representation of the shortest paths in the spatial network can be used with different sets of objects.
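
The first-edge partitioning described in this abstract can be illustrated with a minimal sketch. It is not the SILC implementation: the toy graph, the function names (`first_hops`, `build_tables`, `shortest_path`), and the use of a plain dictionary in place of the quadtree encoding are all assumptions made for illustration. For each source, it records the first vertex on a shortest path to every destination, then recovers a full path by repeatedly following those first hops.

```python
import heapq

def first_hops(graph, source):
    """Dijkstra from `source`; hop[v] is the first vertex after `source`
    on a shortest path to v (hypothetical helper, not from the paper)."""
    dist = {source: 0.0}
    hop = {}
    done = set()
    heap = [(0.0, source, None)]
    while heap:
        d, u, h = heapq.heappop(heap)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if h is not None:
            hop[u] = h
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                # Neighbors of the source start a new first hop;
                # everything further away inherits the hop of u.
                heapq.heappush(heap, (nd, v, v if u == source else h))
    return hop

def build_tables(graph):
    """Precompute one first-hop table per source vertex."""
    return {s: first_hops(graph, s) for s in graph}

def shortest_path(tables, s, t):
    """Recover the path by chaining first-hop lookups."""
    path = [s]
    while s != t:
        s = tables[s][t]
        path.append(s)
    return path

# Toy undirected network (assumed data).
graph = {
    "A": {"B": 1, "D": 4},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"A": 4, "C": 1, "E": 2},
    "E": {"D": 2},
}
tables = build_tables(graph)
print(shortest_path(tables, "A", "E"))
print(tables["D"])
```

In this toy network the destinations A, B, and C all share the first hop C when leaving D, while E has its own. Subsets of this kind are what the abstract describes as being stored compactly when they are also spatially contiguous.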

    Towards a Workload for Evolutionary Analytics

    Emerging data analysis involves the ingestion and exploration of new data sets, application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify several metrics to test system support for evolutionary analytics. Along with our metrics, we present methodologies for running the workload that capture this analytical scenario.

    Accessing Diverse Geo-Referenced Data Sources with the SAND Spatial

    The Internet has become the most frequently accessed medium for obtaining various types of data. In particular, government agencies, academic institutions, and private enterprises have published gigabytes of geo-referenced data on the Web. However, to obtain geo-referenced data from the Web successfully, systems must be designed to be capable of understanding the data sets published in different data formats. Also, even if the data sets are available in a simple known format, they often have poorly defined structures. With these issues in mind, we have developed an Internet-enabled data collection and conversion utility that interfaces with our prototype spatial database system, SAND. Using this utility, data can be retrieved from many different sources on the Web and converted into a format understandable by the SAND spatial database management system. Our collection and conversion utility is able to import the most popular data formats; namely, ESRI Shapefiles, Microsoft Excel files, HTML files, and GML files. Data in unstructured formats are verified for correct selection of the data types and handling of missing tuples before the insertion operation into the database. Moreover, our utility makes it possible to download any nonspatial data set and combine it internally with a relevant spatial data set. These features are accessible through a spreadsheet-like interface for online editing and structuring of data.

    Distance Oracles for Spatial Networks

    The popularity of location-based services and the need to do real-time processing on them have led to an interest in performing queries on transportation networks, such as finding shortest paths and finding nearest neighbors. The challenge is that these operations involve the computation of distance along a spatial network rather than “as the crow flies.” In many applications an estimate of the distance is sufficient, which can be achieved by use of an oracle. An approximate distance oracle is proposed for spatial networks that exploits the coherence between the spatial position of vertices and the network distance between them. Using this observation, a distance oracle is introduced that is able to obtain the ε-approximate network distance between two vertices of the spatial network. The network distance between every pair of vertices in the spatial network is efficiently represented by adapting the well-separated pair technique to spatial networks. Initially, use is made of an ε-approximate distance oracle of size O(n/ε^d) that is capable of retrieving the approximate network distance in O(log n) time using a B-tree. The retrieval time can be theoretically reduced to O(1) by proposing another ε-approximate distance oracle of size O(n log n/ε^d) that uses a hash table. Experimental results indicate that the proposed technique is scalable and can be applied to sufficiently large road networks. A 10%-approximate oracle (ε = 0.1) on a large network yielded an average error of 0.9%, with 90% of the answers making an error of 2% or less, and an average retrieval time of 68 µs. Finally, a strategy for the integration of the distance oracle into any relational database system as well as using it to perform a variety of spatial queries such as region search, k-nearest neighbor search, and spatial joins on spatial networks is discussed.
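
The reason a single stored value can stand in for many pairwise distances is easiest to see in the Euclidean setting, sketched below. This is an illustration only, not the oracle from the paper: Euclidean distance replaces network distance, and the point sets and helper names (`enclosing_ball`, `pair_summary`) are assumed. If two point sets fit in balls of radius r whose gap is at least s·r, then any one representative distance differs from every true pairwise distance by at most 4r, so the relative error is at most 4/s.

```python
import math
from itertools import product

def enclosing_ball(points):
    """Centroid-centered ball that contains all points (not the minimum ball)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    center = (cx, cy)
    return center, max(math.dist(center, p) for p in points)

def pair_summary(A, B):
    """Return (separation s, representative distance, worst relative error)."""
    (ca, ra), (cb, rb) = enclosing_ball(A), enclosing_ball(B)
    r = max(ra, rb)                      # common radius for both balls
    gap = math.dist(ca, cb) - 2 * r      # distance between the two balls
    s = gap / r                          # separation factor
    rep = math.dist(A[0], B[0])          # one stored distance for the whole pair
    worst = max(
        abs(math.dist(p, q) - rep) / math.dist(p, q)
        for p, q in product(A, B)
    )
    return s, rep, worst

# Two small, far-apart clusters (assumed data).
A = [(0, 0), (1, 0), (0, 1), (1, 1)]
B = [(50, 0), (51, 0), (50, 1), (51, 1)]
s, rep, worst = pair_summary(A, B)
print(f"s = {s:.1f}, bound 4/s = {4 / s:.4f}, worst error = {worst:.4f}")
```

Under this Euclidean bound, choosing s = 4/ε would guarantee an ε-approximation. How the separation condition is adapted to network distance is specific to the paper and is not reproduced here.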

    Efficient query processing on spatial networks

    A framework for determining the shortest path and the distance between every pair of vertices on a spatial network is presented. The framework, termed SILC, uses path coherence between the shortest path and the spatial positions of vertices on the spatial network, thereby resulting in an encoding that is compact in representation and fast in path and distance retrievals. Using this framework, a wide variety of spatial queries such as incremental nearest neighbor searches and spatial distance joins can be shown to work on datasets of locations residing on a spatial network of sufficiently large size. The suggested framework is suitable for both main memory and disk-resident datasets.

    Identification of live news events using Twitter

    Twitter presents a source of information that cannot easily be obtained anywhere else. However, though many posts on Twitter reveal up-to-the-minute information about events in the world or interesting sentiments, far more posts are of no interest to the general audience. A method to determine which Twitter users are posting reliable information and which posts are interesting is presented. Using this information a search through a large, online news corpus is conducted to discover future events before they occur along with information about the location of the event. These events can be identified with a high degree of accuracy by verifying that an event found in one news article is found in other similar news articles, since any event interesting to a general audience will likely have more than one news story written about it. Twitter posts near the time of the event can then be identified as interesting if they match the event in terms of keywords or location. This method enables the discovery of interesting posts about current and future events and helps in the identification of reliable users.